Medical (Thrombosis) Data Description
Abstract
Collagen diseases are often dangerous and can be lethal. A severe complication common to these auto-immune diseases is thrombosis, which occurs when coagulated blood clogs blood vessels. Data relevant to the analysis of patients with collagen diseases have been donated to the PKDD Discovery Challenge in the hope that the discovered knowledge will illuminate the mechanisms responsible for collagen diseases and will help to diagnose and predict attacks of thrombosis. The Discovery Challenge at PKDD-99 in Prague brought preliminary results, but the data seem to offer potential for much more knowledge. We describe a number of improvements made to the data and their role in pursuing knowledge useful to doctors. We also describe a number of challenges caused by unconventional values recorded by physicians. The enhanced data are available for the PKDD-2000 Discovery Challenge.

1 Medical problems for knowledge miners' attention

Collagen diseases are disorders of the auto-immune system. Patients generate antibodies that attack their own bodies. This may result in loss of life when the antibodies paralyze the organ where they develop. For example, a patient who generates antibodies in the lungs will chronically lose respiratory function and will finally die. Little is known about the mechanisms responsible for these diseases, and their classification is still fuzzy. Some patients may generate many kinds of antibodies, and their manifestations may include all the characteristics of collagen diseases. In collagen diseases, thrombosis is one of the most important and severe complications and one of the major causes of death. Thrombosis is an increased coagulation of blood which clogs blood vessels. It usually lasts several hours and can recur. This complication has been found to be closely related to anti-cardiolipin antibodies. This was discovered by physicians, one of whom donated the dataset to the Discovery Challenge.
Thrombosis must be treated as an emergency. It is important to predict the possibility of its occurrence. It is also important to detect that it has occurred and to capture temporal patterns specific and sensitive to attacks of thrombosis. Doctors are moreover interested in classifying collagen diseases and in temporal patterns specific and sensitive to each collagen disease.

2 The raw data: PKDD Challenge 1999

The Challenge data may be a source of answers to such questions. The data were collected at Chiba University Hospital. For the 1999 Discovery Challenge they were organized into three tables, which we named TSUM A.CSV, TSUM B.CSV, TSUM C.CSV (for simplicity we will skip the extension .CSV). The tables can be connected by the ID number, unique for each patient.

Each patient first came to the Hospital's Outpatient Clinic on collagen diseases, as recommended by a home doctor or a general physician in a local hospital. The primary data on the patient were recorded at that time. The TSUM A table consisted of approximately 1240 records and contained that information. The table was defined in detail by Tsumoto (1999). Besides ID, the attributes included sex, birthday, the first date when the patient's data were recorded, the date when the patient came to the hospital, and whether the patient was admitted to the hospital or followed in the outpatient clinic. The last attribute was DIAGNOSIS. This was a multi-valued attribute, and upon closer examination the values turned out to belong to several categories, only some of them directly related to collagen diseases. In the section on multi-valued attributes we will discuss the treatment of DIAGNOSIS.

The table TSUM B included special results obtained in the Laboratory on Collagen Diseases. The data were input by doctors. They only include the patients who underwent those special tests.
The data include patient ID, examination date, concentrations of three anti-cardiolipin antibodies (IGG, IGM, IGA), anti-nucleus antibody concentration (ANA), ANA patterns (a multi-valued attribute), and three measures of degree of coagulation (KTC, RVVT, LAC). One attribute described the degree of thrombosis, while two other multi-valued attributes described diagnosis and symptoms. The problems with multi-valued attributes are similar to those of DIAGNOSIS in TSUM A. The examination date was frequently close to the date of thrombosis, but upon closer inquiry it turned out that some of the patients suffered multiple attacks of thrombosis, and much of the relevant data was not included in TSUM B. We will present the new data in the section on data enhancements.

The third table, TSUM C, included ordinary laboratory examinations, one record per test date. Distinct attributes permit storage of the values of 42 specific tests. ID is a foreign key to TSUM A and TSUM B. For some patients many records are available, with dates that stretch over a long time, raising the possibility of time-series analysis. Background knowledge available on attributes in TSUM C included the range of normal values of each test and the meaning of each test described in one or a few words.

3 Brief history of the challenge on thrombosis data

Three contributions were made to the September 1999 Prague challenge chaired by Petr Berka (Beilken & Spenke, 1999; Levin et al., 1999; Taylor, 1999). Also in September 1999, four contributions were made to a workshop in Japan chaired by Shusaku Tsumoto (Ichise & Numao, 2000; Nakamoto, Yoshida & Suzuki, 2000; Negishi, Suyama & Yamaguchi, 2000; Tsukada, Inokuchi, Washio & Motoda, 2000). The contributions are interesting, but the results are preliminary. The most interesting results were obtained with Beilken and Spenke's Infozoom, which captures not only reasonable rules from TSUM B, but also very interesting temporal patterns of laboratory tests before the thrombosis episode.
By comparison, other rule induction methods obtain reasonable results from TSUM B but are not able to induce interesting temporal patterns.

4 Data enhancements: PKDD Challenge 2000

The past challenges demonstrated that multi-relational and multi-valued data are difficult for knowledge miners. Tools are not available, and the problems go beyond the traditional tasks of KDD. Many problems are presented by string-valued attributes. Upon closer inspection, additional data can provide new information essential for doctors' business problems.

4.1 Odd values; string values

Consider values such as "> 107" for the predominantly numerical attribute PLT in TSUM C. Many attributes in TSUM C include such values. They are allowed because the data types are strings rather than numbers. We can understand the convenience of the value "> 107" when the test is not exact, but this value is hard to compare with numerical values. String values allow neither the use of number ordering nor other numerical relations. Unfortunately, there is no quick solution. The normal values of PLT are between 100 and 400, so we can include "> 107" in the normal range, but any detailed number assignment may cause significant error. On the other hand, a combination of numerical and non-numerical values impedes the use of many knowledge discovery tools. Reluctantly, the number 108 was entered as a replacement for "> 107", and the same convention was used in other cases. The original data are also available, so that any Challenge participant can revise the homogenization policy.

The string format caused other problems, too. It is vulnerable to misspelled values, different spacing in disease names, and other non-essential changes.
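The homogenization convention for inexact values can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used to prepare the Challenge data: the function name `homogenize`, the `step` parameter, and the handling of "<" values (mirroring the "> 107" to 108 rule) are hypothetical, and unparseable strings are simply mapped to `None`.

```python
import re

# Matches inexact values such as "> 107" or "<10" (optional spaces, optional decimals).
_INEQUALITY = re.compile(r"^\s*([<>])\s*(\d+(?:\.\d+)?)\s*$")


def homogenize(raw, step=1.0):
    """Turn a string-valued test result into a number, or None if it cannot be parsed.

    "> 107" becomes 108, following the convention described in the text.
    Treating "< x" as x - step is an assumption made here for symmetry.
    """
    text = raw.strip()
    match = _INEQUALITY.match(text)
    if match:
        sign, number = match.group(1), float(match.group(2))
        return number + step if sign == ">" else number - step
    try:
        return float(text)
    except ValueError:
        # Free text or a misspelled value: leave it for manual inspection.
        return None


values = ["250", "> 107", " 98.5 ", "n/a"]
cleaned = [homogenize(v) for v in values]
print(cleaned)  # [250.0, 108.0, 98.5, None]
```

Keeping the raw string next to the cleaned number makes it possible to revise such a policy later, which is the reason the original data remain available to participants.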
While some values could be easily identified by commas ("SLE, PM, PSS"), many cases required help from the database provider, for instance "ANA $BM[@-$N$_ (B" and "Spleen infarction+R[-784]C, PH,thrombophlebitis". The same diagnosis occurred under different names, such as CHRONIC EB and CHRONIC EB VIRUS INFECTION.
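For the simple comma-separated case, splitting and normalizing a multi-valued attribute might look like the following Python sketch. The function `split_diagnoses` and the `SYNONYMS` table are hypothetical names introduced here; the single synonym entry assumes that the two spellings quoted above denote the same diagnosis. Values that use other separators or contain garbled characters would still need help from the database provider.

```python
import re

# Hypothetical synonym table: maps a variant spelling to one canonical name.
SYNONYMS = {
    "CHRONIC EB": "CHRONIC EB VIRUS INFECTION",
}


def split_diagnoses(raw):
    """Split a comma-separated diagnosis string into canonical, de-duplicated values."""
    canonical = []
    for part in raw.split(","):
        # Collapse irregular spacing and ignore case differences.
        name = re.sub(r"\s+", " ", part).strip().upper()
        if name:
            canonical.append(SYNONYMS.get(name, name))
    # dict.fromkeys drops duplicates while preserving the original order.
    return list(dict.fromkeys(canonical))


print(split_diagnoses("SLE, PM, PSS"))        # ['SLE', 'PM', 'PSS']
print(split_diagnoses("chronic  eb , SLE,"))  # ['CHRONIC EB VIRUS INFECTION', 'SLE']
```

Each resulting value can then be stored as its own row or Boolean attribute, so that tools which expect single-valued attributes can be applied.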
Publication date: 2000